Not All Sequence Tags Are Created Equal: Designing and Validating Sequence Identification Tags Robust to Indels
Abstract
Ligating adapters with unique synthetic oligonucleotide sequences (sequence tags) onto individual DNA samples before massively parallel sequencing is a popular and efficient way to obtain sequence data from many individual samples. Tag sequences should be numerous and sufficiently different from one another that sequencing, replication, and oligonucleotide synthesis errors do not make tags unrecoverable or cause one tag to be confused with another. However, many design approaches protect only against substitution errors during sequencing, and extant tag sets contain too few tag sequences. We developed an open-source software package to validate sequence tags for conformance to two distance metrics and to design sequence tags robust to indel and substitution errors. We use this software package to evaluate several commercial and non-commercial sequence tag sets, to design several large sets (max(count) = 7,198) of edit metric sequence tags having different lengths and degrees of error correction, and to integrate a subset of these edit metric tags into polymerase chain reaction (PCR) primers and sequencing adapters. We validate a subset of these edit metric tagged PCR primers and sequencing adapters by sequencing on several platforms and comparing the results to commercially available alternatives. We find that several commonly used sets of sequence tags, or the design methodologies used to produce them, do not meet the minimum expectations of their underlying distance metric, and we find that PCR primers and sequencing adapters incorporating edit metric sequence tags designed by our software package perform as well as their commercial counterparts. We suggest that researchers evaluate sequence tags before use, or evaluate the tags they have already been using. The sequence tag sets we design improve on extant sets because they are large, valid across the set, and robust to the suite of substitution, insertion, and deletion errors affecting massively parallel sequencing workflows on all currently used platforms.
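To make the validation idea concrete, here is a minimal Python sketch of checking a tag set against a minimum pairwise distance. It is not the authors' software package. The abstract does not name its two distance metrics, so this sketch assumes they are Hamming distance (substitutions only) and edit or Levenshtein distance (substitutions, insertions, and deletions). The function names and the example tags are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length tags")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def failing_pairs(tags, min_distance, metric=levenshtein):
    """Return (tag_a, tag_b, distance) for every pair closer than min_distance."""
    failures = []
    for a, b in combinations(tags, 2):
        d = metric(a, b)
        if d < min_distance:
            failures.append((a, b, d))
    return failures


# Hypothetical tags chosen for illustration only.
tags = ["ACGT", "CGTA", "TTTT"]
hamming_failures = failing_pairs(tags, 3, metric=hamming)
edit_failures = failing_pairs(tags, 3, metric=levenshtein)
print(hamming_failures)  # []
print(edit_failures)     # [('ACGT', 'CGTA', 2)]
```

In this toy set every pair differs by at least 3 substitutions, so it passes a Hamming check, yet "ACGT" becomes "CGTA" with one deletion and one insertion. That is the kind of gap the abstract describes: a set that looks safe against substitutions can still be vulnerable to indels.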
Journal title:
Volume 7, Issue
Pages -
Publication date: 2012